Strings and text data
1> lower()
Converts the strings in a Series/Index to lowercase.
import numpy as np
import pandas as pd

str_list = ['meng','zhi','MI', 'w@g', np.nan, '123', 'LiXiao']
pd_str = pd.Series(str_list)
print(f'Lowercase:\n{pd_str.str.lower()}')
# Output:
# Lowercase:
# 0 meng
# 1 zhi
# 2 mi
# 3 w@g
# 4 NaN
# 5 123
# 6 lixiao
# dtype: object
2> upper()
Converts the strings in a Series/Index to uppercase.
str_list = ['meng','zhi','MI', 'w@g', np.nan, '123', 'LiXiao']
pd_str = pd.Series(str_list)
print(f'Uppercase:\n{pd_str.str.upper()}')
# Output:
# Uppercase:
# 0 MENG
# 1 ZHI
# 2 MI
# 3 W@G
# 4 NaN
# 5 123
# 6 LIXIAO
# dtype: object
3> len()
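Computes the length of each string. Missing values stay NaN, which is why the result is float64.
str_list = ['meng','zhi','MI', 'w@g', np.nan, '123', 'LiXiao']
pd_str = pd.Series(str_list)
print(f'String lengths:\n{pd_str.str.len()}')
# Output:
# String lengths: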
# 0 4.0
# 1 3.0
# 2 2.0
# 3 3.0
# 4 NaN
# 5 3.0
# 6 6.0
# dtype: float64
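The float64 result above comes from the NaN entry: a plain integer column cannot hold NaN, so the lengths are upcast. The following is a minimal sketch, assuming a pandas version that supports the nullable "string" dtype (1.0 or later), of one way to keep integer lengths. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same data as above, but stored with the nullable "string" dtype.
names = pd.Series(['meng', 'zhi', 'MI', 'w@g', np.nan, '123', 'LiXiao'],
                  dtype='string')

# With the "string" dtype, str.len() returns nullable integers (Int64),
# and the missing entry is shown as <NA> instead of forcing float64.
lengths = names.str.len()
print(lengths)
```

Whether this is preferable depends on how the lengths are used later; some downstream code may still expect NumPy floats.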
4> strip()
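Removes leading and trailing whitespace (including newlines) from each string in the Series/Index. Whitespace inside a string is kept.
str_list = [' meng', 'zhi ', 'mi', 'Li XIao']
pd_str = pd.Series(str_list)
print(f'After stripping whitespace:\n{pd_str.str.strip()}')
# Output:
# After stripping whitespace: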
# 0 meng
# 1 zhi
# 2 mi
# 3 Li XIao
# dtype: object
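strip() has one-sided variants, and it also accepts a set of characters to remove instead of whitespace. A short sketch with made-up sample data:

```python
import pandas as pd

raw = pd.Series(['  meng\n', '\tzhi  ', '--mi--'])

left = raw.str.lstrip()       # remove leading whitespace only
right = raw.str.rstrip()      # remove trailing whitespace only
dashes = raw.str.strip('-')   # remove a custom set of characters instead

print(left.tolist())
print(right.tolist())
print(dashes.tolist())
```

Note that passing '-' replaces the default: whitespace is no longer stripped in that call.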
5> split()
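Splits each string around the given pattern. With no argument, it splits on runs of whitespace and ignores leading and trailing whitespace.
str_list = [' meng', 'zhi ', 'mi', 'Li XIao']
pd_str = pd.Series(str_list)
print(f'{pd_str.str.split()}')
# Output: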
# 0 [meng]
# 1 [zhi]
# 2 [mi]
# 3 [Li, XIao]
# dtype: object
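The example above relies on the default whitespace split. As a rough sketch with illustrative data, an explicit separator can be combined with `expand` and `n`:

```python
import pandas as pd

records = pd.Series(['meng,zhi,mi', 'li,xiao'])

# Split on an explicit separator; each element becomes a list.
as_lists = records.str.split(',')

# expand=True spreads the pieces across DataFrame columns,
# padding shorter rows with missing values.
as_frame = records.str.split(',', expand=True)

# n limits the number of splits (here: at most one split per string).
first_split = records.str.split(',', n=1)

print(as_lists)
print(as_frame)
print(first_split)
```

`expand=True` tends to be convenient when every row is expected to have the same number of fields.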
6> cat()
Concatenates the elements of a Series/Index into a single string, using the given separator.
str_list = [' meng', 'zhi ', 'mi', 'Li XIao']
pd_str = pd.Series(str_list)
print(f'{pd_str.str.cat(sep = ">=")}')
# Output:
# meng>=zhi >=mi>=Li XIao
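cat() behaves differently when the data contains missing values or when a second list-like is passed. A minimal sketch, using sample data that is not from the original:

```python
import numpy as np
import pandas as pd

parts = pd.Series(['meng', np.nan, 'mi'])

# Missing values are skipped by default when joining into one string.
joined = parts.str.cat(sep='-')

# na_rep substitutes a placeholder for missing values instead.
joined_with_placeholder = parts.str.cat(sep='-', na_rep='?')

# Passing another list-like joins element by element and returns a Series;
# a missing value on either side gives a missing result.
pairwise = parts.str.cat(['a', 'b', 'c'], sep='_')

print(joined)
print(joined_with_placeholder)
print(pairwise)
```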
7> get_dummies()
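Returns a DataFrame of one-hot encoded values. Each string is split on '|' by default; the strings here contain no '|', so each whole string (spaces included) becomes one column.
str_list = [' meng', 'zhi ', 'mi', 'Li XIao']
pd_str = pd.Series(str_list)
print(f'{pd_str.str.get_dummies()}')
# Output:
#     meng  Li XIao  mi  zhi
# 0      1        0   0     0
# 1      0        0   0     1
# 2      0        0   1     0
# 3      0        1   0     0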
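get_dummies() becomes more useful when each string holds several delimited labels. The sketch below uses hypothetical tag data to show the default '|' delimiter and a custom `sep`:

```python
import pandas as pd

tags = pd.Series(['red|round', 'round', 'red|small'])

# Each '|'-separated token becomes its own indicator column.
dummies = tags.str.get_dummies()

# A different delimiter can be passed through sep.
comma_dummies = pd.Series(['red,round', 'small']).str.get_dummies(sep=',')

print(dummies)
print(comma_dummies)
```

The columns come out in sorted order, which is why ' meng' (with its leading space) appears first in the example above.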
8> contains()
Returns True if the element contains the substring (or regular expression pattern), otherwise False.
str_list = [' meng', 'zhi ', 'mi', 'Li XIao']
pd_str = pd.Series(str_list)
print(f'{pd_str.str.contains(" ")}')
# Output:
# 0 True
# 1 True
# 2 False
# 3 True
# dtype: bool
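contains() treats its pattern as a regular expression by default and propagates missing values. A small sketch with illustrative data shows the `case`, `na`, and `regex` parameters:

```python
import numpy as np
import pandas as pd

names = pd.Series(['meng', 'Zhi', np.nan, 'a.b'])

# case=False ignores case; na=False reports missing values as False.
has_z = names.str.contains('z', case=False, na=False)

# The pattern is a regular expression by default, so '.' matches any
# character; regex=False searches for the literal dot instead.
any_char = names.str.contains('.', na=False)
literal_dot = names.str.contains('.', regex=False, na=False)

print(has_z.tolist())
print(any_char.tolist())
print(literal_dot.tolist())
```

Setting `na=False` is usually what is wanted when the result is used as a boolean mask for filtering.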
9> replace()
Replaces every occurrence of substring a with b in each string.
str_list = [' meng', 'zhi ', 'mi', 'Li XIao']
pd_str = pd.Series(str_list)
print(f'{pd_str.str.replace("i","b")}')
# Output:
# 0 meng
# 1 zhb
# 2 mb
# 3 Lb XIao
# dtype: object
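Whether the pattern is read as a regular expression depends on the `regex` parameter, and its default has changed between pandas versions, so it is probably safest to pass it explicitly. A minimal sketch with made-up data:

```python
import pandas as pd

codes = pd.Series(['a1b22', 'zhi', 'x.y'])

# regex=True treats the pattern as a regular expression.
digits_masked = codes.str.replace(r'\d+', '#', regex=True)

# regex=False replaces the literal text only.
dot_replaced = codes.str.replace('.', '-', regex=False)

print(digits_masked.tolist())
print(dot_replaced.tolist())
```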
10> repeat()
Repeats each element the specified number of times.
str_list = [' meng', 'zhi ', 'mi ', 'Li XIao ']
pd_str = pd.Series(str_list)
print(f'{pd_str.str.repeat(2)}')
# Output:
# 0 meng meng
# 1 zhi zhi
# 2 mi mi
# 3 Li XIao Li XIao
# dtype: object
11> count()
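Returns the number of times the pattern occurs in each element. The match is case-sensitive, so the uppercase 'I' in 'Li XIao ' is not counted.
str_list = [' meng', 'zhi ', 'mi ', 'Li XIao ']
pd_str = pd.Series(str_list)
print(f'Number of occurrences:\n{pd_str.str.count("i")}')
# Output:
# Number of occurrences: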
# 0 0
# 1 1
# 2 1
# 3 1
# dtype: int64
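The argument to count() is a regular expression, not a plain substring, so special characters need escaping. A short sketch with illustrative data:

```python
import pandas as pd

words = pd.Series(['meng', 'Li XIao', 'a.b.c'])

# A character class counts any lowercase vowel.
vowels = words.str.count('[aeiou]')

# An escaped dot counts literal dots only; an unescaped '.' would
# match every character.
dots = words.str.count(r'\.')

print(vowels.tolist())
print(dots.tolist())
```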
12> startswith()
Returns True if the element starts with the given string, otherwise False. Leading whitespace counts, so ' meng' does not start with "m".
str_list = [' meng', 'zhi ', 'mi ', 'Li XIao ']
pd_str = pd.Series(str_list)
print(f'{pd_str.str.startswith("m")}')
# Output:
# 0 False
# 1 False
# 2 True
# 3 False
# dtype: bool
13> endswith()
Returns True if the element ends with the given string, otherwise False. Trailing whitespace counts, so 'zhi ' does not end with "i".
str_list = [' meng', 'zhi ', 'mi ', 'Li XIao ']
pd_str = pd.Series(str_list)
print(f'{pd_str.str.endswith("g")}')
# Output:
# 0 True
# 1 False
# 2 False
# 3 False
# dtype: bool
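Because surrounding whitespace takes part in the comparison, startswith() and endswith() are often chained after strip(). A minimal sketch, with sample data chosen for illustration:

```python
import numpy as np
import pandas as pd

names = pd.Series([' meng', 'zhi ', 'mi ', np.nan])

# Whitespace counts as a character, so strip before testing.
# na=False reports missing values as False instead of NaN.
starts_with_m = names.str.strip().str.startswith('m', na=False)
ends_with_i = names.str.strip().str.endswith('i', na=False)

print(starts_with_m.tolist())
print(ends_with_i.tolist())
```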
14> find()
Returns the index of the first occurrence of the substring in each string, or -1 if it is not found.
str_list = [' meng', 'zhi ', 'mi ', 'Li XIao ']
pd_str = pd.Series(str_list)
print(f'{pd_str.str.find("i")}')
# Output:
# 0 -1
# 1 2
# 2 1
# 3 1
# dtype: int64
15> findall()
Returns a list of all occurrences of the pattern in each string; the list is empty when there is no match.
str_list = [' meng', 'zhi ', 'mi ', 'Li XIao ']
pd_str = pd.Series(str_list)
print(f'{pd_str.str.findall("i")}')
# Output:
# 0 []
# 1 [i]
# 2 [i]
# 3 [i]
# dtype: object
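findall() also takes a regular expression, which makes it handy for pulling out every match of a pattern. A small sketch with hypothetical data:

```python
import pandas as pd

codes = pd.Series(['a1b22', 'zhi', '7'])

# Collect every run of digits in each string; no match gives an empty list.
numbers = codes.str.findall(r'\d+')

print(numbers.tolist())
```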
16> swapcase()
Swaps the case of each character: lowercase becomes uppercase and vice versa.
str_list = [' meng', 'Zhi ', 'Mi ', 'Li Xiao ']
pd_str = pd.Series(str_list)
print(f'{pd_str.str.swapcase()}')
# Output:
# 0 MENG
# 1 zHI
# 2 mI
# 3 lI xIAO
# dtype: object
17> islower()
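Checks whether all cased characters in each string of the Series/Index are lowercase and returns a boolean. Whitespace is ignored.
str_list = [' meng', 'Zhi ', 'Mi ', 'Li Xiao ']
pd_str = pd.Series(str_list)
print(f'{pd_str.str.islower()}')
# Output:
# 0 True
# 1 False
# 2 False
# 3 False
# dtype: bool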
18> isupper()
Checks whether all cased characters in each string of the Series/Index are uppercase and returns a boolean.
str_list = [' meng', 'Zhi ', 'Mi ', 'Li Xiao ']
pd_str = pd.Series(str_list)
print(f'{pd_str.str.isupper()}')
# Output:
# 0 False
# 1 False
# 2 False
# 3 False
# dtype: bool
19> isnumeric()
Checks whether all characters in each string of the Series/Index are numeric and returns a boolean.
str_list = ['123', 'Zhi ', 'Mi ', 'Li Xiao ']
pd_str = pd.Series(str_list)
print(f'{pd_str.str.isnumeric()}')
# Output:
# 0 True
# 1 False
# 2 False
# 3 False
# dtype: bool
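isnumeric() has two stricter relatives, isdigit() and isdecimal(), which differ on Unicode characters such as fractions and superscripts. The sketch below compares them on a few sample strings chosen for illustration:

```python
import pandas as pd

samples = pd.Series(['123', '½', '²', '1.5'])

checks = pd.DataFrame({
    'isdecimal': samples.str.isdecimal(),   # base-10 digit characters only
    'isdigit': samples.str.isdigit(),       # also accepts superscript digits
    'isnumeric': samples.str.isnumeric(),   # also accepts fractions like '½'
})

print(checks)
```

None of the three accepts '1.5', since the dot is not a numeric character; parsing with pd.to_numeric is one option when decimal numbers need to be recognized.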